Abstract

A natural requirement of many distributed structures is fault-tolerance: after some failures, 
whatever remains from the structure should still be effective for whatever remains from the 
network. In this paper we examine spanners of general graphs that are tolerant to vertex 
failures, and significantly improve their dependence on the number of faults r, for all stretch 
bounds. 

For stretch $k \ge 3$ we design a simple transformation that converts every $k$-spanner construction with at most $f(n)$ edges into an $r$-fault tolerant $k$-spanner construction with at most $O(r^3 \log n) \cdot f(2n/r)$ edges. Applying this to standard greedy spanner constructions gives $r$-fault tolerant $k$-spanners with $\tilde{O}(r^2 n^{1+2/(k+1)})$ edges. The previous construction by Chechik, Langberg, Peleg, and Roditty [STOC 2009] depends similarly on $n$ but exponentially on $r$ (approximately like $k^r$).

For the case $k = 2$ and unit-length edges, an $O(r \log n)$-approximation algorithm is known from recent work of Dinitz and Krauthgamer [arXiv 2010], where several spanner results are obtained using a common approach of rounding a natural flow-based linear programming relaxation. Here we use a different (stronger) LP relaxation and improve the approximation ratio to $O(\log n)$, which is, notably, independent of the number of faults $r$. We further strengthen this bound in terms of the maximum degree by using the Lovász Local Lemma.

Finally, we show that most of our constructions are inherently local by designing equivalent distributed algorithms in the LOCAL model of distributed computation.
1 Introduction



Let $G = (V,E)$ be a graph, possibly with edge-lengths $\ell : E \to \mathbb{R}_{>0}$. A $k$-spanner of $G$, for $k \ge 1$, is a subgraph $G' = (V, E')$ that preserves all pairwise distances within factor $k$, i.e. for all $u,v \in V$,

$$ d_{G'}(u,v) \le k \cdot d_G(u,v). \tag{1} $$

Here and throughout, $d_H$ denotes the shortest-path distance in a graph $H$ and $n = |V|$. The distance preservation factor $k$ is called the stretch of the spanner. It is easy to see that requiring (1) only for edges $(u,v) \in E$ suffices. This definition also extends naturally to directed graphs.
Obviously G is a 1-spanner of itself, so usually the goal is to compute a "small" spanner. Two 
traditional notions of "small" are the number of edges in G' (called the size of G'), and the weight 
of G' (where the weight of a graph is the sum of the lengths of the edges in the graph). If every 
edge has unit length then these two notions are the same, but for more general edge lengths they 
can be quite different. 
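The definition, together with the remark that it suffices to check (1) on the edges of $G$, translates directly into a checker. The following is a minimal Python sketch, assuming an undirected graph given as a list of `(u, v, length)` triples with positive lengths; the names `build_adj`, `dijkstra`, and `is_k_spanner` are illustrative and not from the paper. It runs one Dijkstra search in $H$ per edge of $G$, favoring simplicity over efficiency.

```python
import heapq
from collections import defaultdict


def build_adj(edges):
    """Undirected adjacency lists from (u, v, length) triples."""
    adj = defaultdict(list)
    for u, v, w in edges:
        adj[u].append((v, w))
        adj[v].append((u, w))
    return adj


def dijkstra(adj, src):
    """Shortest-path distances from src (lengths assumed positive)."""
    dist = {src: 0.0}
    heap = [(0.0, src)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue  # stale heap entry
        for v, w in adj[u]:
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist


def is_k_spanner(g_edges, h_edges, k):
    """True iff d_H(u, v) <= k * length(u, v) for every edge (u, v) of G,
    which is equivalent to condition (1) for all pairs."""
    adj_h = build_adj(h_edges)
    for u, v, w in g_edges:
        if dijkstra(adj_h, u).get(v, float("inf")) > k * w:
            return False
    return True
```

For example, on the triangle with edges $ab$ and $bc$ of length 1 and $ac$ of length 1.5, the path through $b$ gives a 2-spanner but not a 1-spanner.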

This notion of graph spanners, first introduced by Peleg and Schäffer [PS89] and Peleg and Ullman [PU89], has been studied extensively, with applications ranging from routing in networks (e.g. [AP95, TZ05]) to solving linear systems (e.g. [ST04, EEST08]). Many of these applications, especially in distributed computing, arise by modeling computer networks or distributed systems as graphs. But one aspect of distributed systems that is not captured by the above spanner definition is the possibility of failure. We would like our spanner to be robust to failures, so that even if some nodes fail we still have a spanner of what remains. More formally, $G'$ is an $r$-fault tolerant $k$-spanner of $G$ if for every set $F \subseteq V$ with $|F| \le r$, the spanner condition holds for $G \setminus F$, i.e. for all $u,v \in V \setminus F$ we have $d_{G' \setminus F}(u,v) \le k \cdot d_{G \setminus F}(u,v)$.
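For very small graphs, the fault-tolerant definition can be checked by brute force over all fault sets with $|F| \le r$. The sketch below assumes the same undirected `(u, v, length)` edge format; `shortest` and `is_r_fault_tolerant` are hypothetical helpers, and the running time is exponential in $r$, so this is only meant for sanity checks. For each $F$ it applies the edge-only criterion to $G \setminus F$: $H \setminus F$ must contain a path of length at most $k \cdot \ell(u,v)$ for every surviving edge $(u,v)$ of $G$.

```python
import heapq
from collections import defaultdict
from itertools import combinations


def shortest(edges, src, dst):
    """Dijkstra distance from src to dst using only the given (u, v, length) edges."""
    adj = defaultdict(list)
    for u, v, w in edges:
        adj[u].append((v, w))
        adj[v].append((u, w))
    dist = {src: 0.0}
    heap = [(0.0, src)]
    while heap:
        d, x = heapq.heappop(heap)
        if x == dst:
            return d
        if d > dist[x]:
            continue  # stale heap entry
        for y, w in adj[x]:
            if d + w < dist.get(y, float("inf")):
                dist[y] = d + w
                heapq.heappush(heap, (d + w, y))
    return float("inf")


def is_r_fault_tolerant(vertices, g_edges, h_edges, k, r):
    """Brute force: H minus F must be a k-spanner of G minus F for all |F| <= r."""
    for size in range(r + 1):
        for F in map(set, combinations(vertices, size)):
            g_rest = [e for e in g_edges if e[0] not in F and e[1] not in F]
            h_rest = [e for e in h_edges if e[0] not in F and e[1] not in F]
            for u, v, w in g_rest:
                if shortest(h_rest, u, v) > k * w:
                    return False
    return True
```

For instance, the 4-cycle is a 1-fault tolerant 3-spanner of the unit-length $K_4$, but not a 2-fault tolerant one: removing two opposite cycle vertices leaves a chord of $K_4$ with no path in the cycle.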

This notion of fault-tolerant spanners was first introduced by Levcopoulos, Narasimhan, and Smid [LNS95] in the context of geometric spanners (the special case when the vertices are in Euclidean space and the distance between two points is the Euclidean distance). They provided both size and weight bounds for $(1+\epsilon)$-spanners, which were later improved by Lukovszki [Luk99] and Czumaj and Zhao [CZ03]. The first result on fault-tolerant spanners for general graphs, by Chechik, Langberg, Peleg, and Roditty [CLPR09], constructs $r$-fault tolerant $(2k-1)$-spanners with size $O(r^2 k^{r+1} \cdot n^{1+1/k} \log^{1-1/k} n)$, for any integer $k \ge 1$. Since it has long been known how to construct $(2k-1)$-spanners with size $O(n^{1+1/k})$ (see e.g. [ADD+93]), this means that the extra cost of $r$-fault tolerance is $O(r^2 k^{r+1})$. While this is independent of $n$, it grows rapidly as the number of faults $r$ gets large. We address an important question they left open of improving this dependence on $r$ from exponential to polynomial.

Nontrivial absolute bounds on the size of a $k$-spanner are possible only when the stretch $k \ge 3$. For $k = 2$, there are graphs with $\Omega(n^2)$ edges for which every edge must be included in the spanner (e.g., a complete bipartite graph). So the common approach is to provide relative bounds, namely, design approximation algorithms for the problem of computing a minimum size/weight $r$-fault tolerant 2-spanner. In this context one assumes that all edges have unit length, so the size equals the weight. Without fault tolerance, the problem is reasonably well understood: there are algorithms that provide an $O(\log n)$-approximation [KP94, EP01] (or, with some extra effort, an $O(\log(|E|/|V|))$-approximation), and the problem is NP-hard to approximate better than $\Omega(\log n)$ [Kor01]. For the $r$-fault tolerant 2-spanner problem, Dinitz and Krauthgamer [DK10] recently gave an $O(r \log n)$-approximation. However, they did not provide evidence that this loss of $r$ was necessary, an issue that we address in this paper.
1.1 Results and Techniques

Stretch bounds $k \ge 3$. Here, our main result is a new $r$-fault tolerant $k$-spanner with size that depends only polynomially on $r$, thereby improving over the exponential dependence by Chechik et al. [CLPR09].



Theorem 1.1. For every graph $G = (V,E)$ with positive edge-lengths and odd $k \ge 3$, there is an $r$-fault tolerant $k$-spanner with size $O(r^{2-2/(k+1)} n^{1+2/(k+1)} \log n)$.

In fact, we prove something slightly stronger: a general conversion theorem that turns any algorithm for constructing $k$-spanners with size $f(n)$ into an algorithm for constructing $r$-fault tolerant $k$-spanners with size $O(r^3 \log n \cdot f(2n/r))$. Applying this conversion to the well-known greedy spanner algorithm (see e.g. [ADD+93]) immediately yields Theorem 1.1.



At a high level, Chechik et al. [CLPR09] apply the spanner construction of Thorup and Zwick [TZ05] to every possible fault set, eventually taking the union of all of these spanners. They show, through a rather involved analysis that relies on specific properties of the Thorup-Zwick construction, that taking a union over as many as $O(n^r)$ spanners increases the size bound only by an $O(r^2 k^r)$ factor. Our conversion technique, on the other hand, is extremely general. Inspired by the color-coding technique of Alon, Yuster, and Zwick [AYZ95] and its recent incarnation in designing data structures and oracles [WY10], we randomly sample nodes to act as a fault set, and then apply a generic spanner algorithm on what remains. Our sampling dramatically oversamples nodes: instead of fault sets of size $r$, we end up with sampled fault sets of size approximately $(1 - 1/r)n$. This allows us to satisfy many fault sets of size $r$ with a single iteration of the generic algorithm. The size bound follows almost immediately.

Stretch $k = 2$ (and assuming unit-length edges). Here, our main result is an approximation algorithm with ratio that is independent of $r$. Our algorithm actually works in an even more general setting, where the graph is directed and edges have costs $c : E \to \mathbb{R}_{\ge 0}$. The goal is to find an $r$-fault tolerant 2-spanner of minimum total cost. We refer to this problem as Minimum Cost $r$-Fault Tolerant 2-Spanner.

Theorem 1.2. For every $r < n$, there is a (randomized) $O(\log n)$-approximation algorithm for Minimum Cost $r$-Fault Tolerant 2-Spanner.



This improves over the previously known $O(r \log n)$-approximation [DK10]. Similarly to [DK10], we design a flow-based linear programming (LP) relaxation of the problem and then apply a rounding scheme that uses randomization at the vertices, rather than naively at the edges. However, the relaxation used by [DK10] is not strong enough to achieve an approximation factor independent of $r$; even simple graphs (such as the complete graph with unit costs) have integrality gaps of $\Omega(r)$. We thus design a different relaxation, and add to it a large family of constraints that are essentially the knapsack-cover inequalities of Carr, Fleischer, Leung, and Phillips [CFLP00], adapted to our context. With these additional constraints, we are able to show that the simple rounding scheme devised in [DK10] now achieves an $O(\log n)$-approximation.

We further show that the integrality gap is at most $O(\log \Delta)$, where $\Delta$ is the maximum degree of the graph, in the special case where all edge costs are 1. Note that this bound is at least as good as the $O(\log n)$ bound (and possibly better). We prove this by a more careful analysis of essentially the same randomized rounding scheme using the Lovász Local Lemma; by using the constructive version of Moser and Tardos [MT10], the rounding still yields a randomized polynomial-time algorithm.
Distributed versions of our algorithms. Finally, one feature that is shared by both the $k = 2$ and the $k \ge 3$ case is that the algorithms are local (assuming that the generic algorithm used by the conversion theorem is itself local). To show this formally, we provide distributed versions of the algorithms in the LOCAL model of distributed computation. The LOCAL model is a standard message-passing model in which in each round, every node is allowed to send an unbounded-size message to each of its neighbors [Pel00]. While the unbounded message-size assumption may not be realistic, this model captures locality in the sense that in $t$ rounds, each node has knowledge of, and is influenced by, only the nodes that are within (hop-)distance $t$ of it.

Assuming that the underlying generic spanner algorithm is distributed in this sense, our general conversion theorem trivially provides a distributed algorithm, since the failure sampling is done independently by every vertex. Designing a distributed version of the $r$-fault tolerant 2-spanner algorithm is not quite as simple, since our centralized algorithm uses the Ellipsoid method to solve a linear program that has an exponential number of constraints. While there is a significant amount of literature on solving linear programs in a distributed manner, much of the time strong assumptions are made about the structure of the linear program. In particular, it is common to assume that the LP is a positive (i.e. a packing/covering) LP. Unfortunately the LP relaxation that we use is not positive, even for $r = 0$, so we cannot simply use an off-the-shelf distributed LP solver. Instead, we leverage the fact that the LP itself is "mostly" local: we partition the graph into clusters, solve the LP separately on each cluster, and then repeat this process several times, eventually taking the average values. This technique is quite similar to the work of Kuhn, Moscibroda, and Wattenhofer [KMW06], who showed how to approximately solve positive LPs using the graph decompositions of Linial and Saks [LS93]. We construct padded decompositions using a variant of the methods developed by Bartal [Bar96] and by Linial and Saks [LS93]. Combining this distributed methodology for solving the LP relaxation with the obvious distributed implementation of the aforementioned rounding scheme, we obtain the following distributed $O(\log n)$-approximation.

Theorem 1.3. There is a randomized algorithm that takes $O(\log^2 n)$ rounds and gives an $O(\log n)$-approximation for Minimum Cost $r$-Fault Tolerant 2-Spanner in the LOCAL model of distributed computation.



2 General k 

In this section we give our construction of $r$-vertex-tolerant $k$-spanners (with arbitrary edge-lengths). For each $F \subseteq V$ with $|F| \le r$, we let $E_F$ denote the edges of $G \setminus F$, i.e. $E_F = \{\{u,v\} \in E : u,v \notin F\}$. We first give a general conversion theorem that turns any $k$-spanner construction into an $r$-fault tolerant $k$-spanner construction at an extra cost of at most $\mathrm{poly}(r) \cdot \log n$. This conversion actually works fine even when the underlying spanner construction is randomized, but since good deterministic constructions exist we will assume for simplicity that the underlying construction is deterministic. We say that an event happens with high probability if it happens with probability at least $1 - 1/n^C$ for a constant $C$ that can be made arbitrarily large (at the cost of increasing the constants hidden by $O(\cdot)$ notation).

Theorem 2.1. If there is an algorithm that on every graph builds a $k$-spanner of size $f(n)$, then there is an algorithm that on any graph builds with high probability an $r$-fault tolerant $k$-spanner of size $O(r^3 \log n \cdot f(2n/r))$.
Proof. Our algorithm is simple: in each iteration, we independently add each vertex to a set $J$ with probability $p = 1 - 1/r$, and then use the given algorithm to build a $k$-spanner on the remaining graph $G \setminus J$. If $r = 1$ then we can set $p = 1/2$, which will just affect the constants in the $O(\cdot)$. We do this for $\alpha = O(r^3 \log n)$ iterations, each independent of the others. Let $H$ be the graph obtained by taking the union of the iterations.

We first bound the size of $H$. Without loss of generality we can assume that $r \le n^{2/3}$, since when $r > n^{2/3}$ the claimed size bound is larger than $n^2$ and thus trivially true. In each iteration, the expected number of vertices in $G \setminus J$ is $n/r$. By a simple Chernoff bound, the probability that a given iteration has more than $2n/r$ vertices in $G \setminus J$ is at most $e^{-(n/r)/3} \le e^{-n^{1/3}/3}$. Since there are only $\alpha = O(r^3 \log n) \le O(n^2 \log n)$ iterations, we can take a union bound over the iterations and get that with high probability the number of vertices in $G \setminus J$ is at most $2n/r$ in every iteration. Thus the total size of $H$ is at most $O(\alpha \cdot f(2n/r))$. Now we just need to prove that this algorithm results in a valid $r$-fault tolerant $k$-spanner for $\alpha = O(r^3 \log n)$.
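The sampling loop described here is easy to express in code. The following Python sketch assumes undirected `(u, v, length)` edges, $r \ge 1$, and treats the base $k$-spanner algorithm as a black-box function `base_spanner(vertices, edges)`; that name, the seed handling, and the constant `c` standing in for the constant hidden in $\alpha = O(r^3 \log n)$ are all illustrative assumptions. Each iteration puts every vertex into $J$ independently with probability $p$, runs the base algorithm on $G \setminus J$, and adds its output to the union $H$.

```python
import math
import random


def fault_tolerant_spanner(vertices, edges, r, base_spanner, c=3.0, seed=None):
    """Union of base spanners built on G minus a random vertex sample J.

    Assumptions (illustrative only): r >= 1, edges are (u, v, length) triples,
    base_spanner(vertices, edges) returns a subset of the given edges, and
    c stands in for the constant hidden in alpha = O(r^3 log n).
    """
    rng = random.Random(seed)
    n = len(vertices)
    p = 0.5 if r == 1 else 1.0 - 1.0 / r                 # Pr[vertex joins J]
    alpha = math.ceil(c * r**3 * math.log(max(n, 2)))    # number of iterations
    union = set()
    for _ in range(alpha):
        J = {v for v in vertices if rng.random() < p}    # sampled "fault set"
        rest = [v for v in vertices if v not in J]
        sub = [e for e in edges if e[0] not in J and e[1] not in J]  # edges of G \ J
        union.update(base_spanner(rest, sub))
    return union
```

Any $k$-spanner routine can be plugged in as `base_spanner`, for instance a greedy construction; the stretch guarantee of each piece comes from the base algorithm, while the sampling only controls which fault sets get covered.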

For each $F \subseteq V$ with $|F| \le r$, let $E'_F$ be the edges in $E_F$ for which the shortest path in $G \setminus F$ between the endpoints is just the edge. More formally, $E'_F = \{\{u,v\} \in E_F : d_{G \setminus F}(u,v) = \ell(\{u,v\})\}$. It is easy to see that it is sufficient for there to be a path of length at most $k \cdot d_{G \setminus F}(u,v)$ between $u$ and $v$ in $H \setminus F$ for every $F \subseteq V$ with $|F| \le r$ and $\{u,v\} \in E'_F$. This is because for a given failure set $F$, if we distort the distances of all remaining edges that are actually part of shortest paths by at most $k$, then we distort the distances of all pairs by at most $k$ (since each edge on the shortest path is distorted by at most $k$). So we consider a particular such $F$ and $\{u,v\}$ and upper bound the probability that there is no stretch-$k$ path between $u$ and $v$ in $H \setminus F$.

Suppose that in some iteration neither $u$ nor $v$ is in $J$, but all of $F$ is in $J$. Then since $\{u,v\} \in E'_F$ (and $G \setminus J$ is a subgraph of $G \setminus F$ that still contains the edge $\{u,v\}$), the spanner that we build on $G \setminus J$ contains a path between $u$ and $v$ of length at most $k \cdot d_{G \setminus J}(u,v) = k \cdot \ell(\{u,v\}) = k \cdot d_{G \setminus F}(u,v)$. Obviously this path also exists in $H \setminus F$, since $F \subseteq J$. So if this happens then $H$ is valid for $\{u,v\}$ and $F$. The probability that this happens in a particular iteration is clearly $(1-p)^2 \cdot p^{|F|}$, which is at least $1/(4r^2)$ as long as $r \ge 2$ (if $r = 1$ then this probability is $1/8$, which does not significantly affect the results). Thus the probability that this never happens in any iteration is at most $(1 - 1/(4r^2))^{\alpha} \le e^{-\alpha/(4r^2)}$, so if we set $\alpha = \Theta(r^3 \log n)$ this becomes less than $1/n^{Cr}$ for an arbitrarily large constant $C$. Now taking a union bound over all $\{u,v\}$ and $F$ gives the theorem. □
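To make the choice of $\alpha$ concrete, here is one way to instantiate the constants for $r \ge 2$ (the case $r = 1$ only changes constants). The specific value of $\alpha$ below is our own illustrative choice rather than one fixed in the text; it uses the fact that there are at most $n^2 (n+1)^r \le n^{2r+2}$ pairs $(\{u,v\}, F)$ when $n \ge 2$.

$$
\begin{aligned}
\Pr\bigl[\text{no iteration has } u, v \notin J \text{ and } F \subseteq J\bigr]
  &\le \Bigl(1 - \frac{1}{4r^2}\Bigr)^{\alpha} \le e^{-\alpha/(4r^2)}, \\
\alpha = 4r^2\,(2r + 2 + C)\ln n = \Theta(r^3 \log n)
  \quad&\Longrightarrow\quad e^{-\alpha/(4r^2)} = n^{-(2r+2+C)}, \\
\Pr\bigl[\text{some pair } (\{u,v\}, F) \text{ fails}\bigr]
  &\le n^{2r+2} \cdot n^{-(2r+2+C)} = n^{-C}.
\end{aligned}
$$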

Corollary 2.2. For every graph $G = (V,E)$ with nonnegative edge lengths $\ell : E \to \mathbb{R}_{\ge 0}$ and every odd $k \ge 1$, there is a polynomial time algorithm that with high probability constructs an $r$-vertex-tolerant $k$-spanner with at most $O(r^{2-2/(k+1)} n^{1+2/(k+1)} \log n)$ edges.



Proof. Althöfer et al. [ADD+93] showed that the simple greedy spanner construction has size at most $O(n^{1+2/(k+1)})$. Applying Theorem 2.1 to this construction completes the proof. □
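The greedy construction used in this proof can be sketched as follows, assuming undirected `(u, v, length)` edges; `greedy_spanner` and `bounded_dist` are illustrative names. Edges are scanned in nondecreasing order of length, and an edge is kept only if the current spanner does not already contain a path of length at most $k$ times its length, so by the edge-only criterion the output is a $k$-spanner. The $O(n^{1+2/(k+1)})$ size bound for odd $k$ is the result of Althöfer et al.; the code does not check it.

```python
import heapq
import math
from collections import defaultdict


def bounded_dist(adj, s, t, limit):
    """Dijkstra distance from s to t, giving up (returning inf) once all
    tentative distances exceed limit."""
    dist = {s: 0.0}
    heap = [(0.0, s)]
    while heap:
        d, x = heapq.heappop(heap)
        if d > limit:
            break  # every remaining tentative distance is also above limit
        if x == t:
            return d
        if d > dist[x]:
            continue  # stale heap entry
        for y, w in adj[x]:
            if d + w < dist.get(y, math.inf):
                dist[y] = d + w
                heapq.heappush(heap, (d + w, y))
    return math.inf


def greedy_spanner(edges, k):
    """Greedy k-spanner: keep (u, v) only if the current spanner has no
    u-v path of length at most k * length(u, v)."""
    adj = defaultdict(list)
    kept = []
    for u, v, w in sorted(edges, key=lambda e: e[2]):
        if bounded_dist(adj, u, v, k * w) > k * w:
            adj[u].append((v, w))
            adj[v].append((u, w))
            kept.append((u, v, w))
    return kept
```

On the unit-length complete graph $K_4$ with $k = 3$, the scan keeps only the three edges at the first vertex, since every later edge is already spanned by a path of length 2.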



Since Theorem 2.1 applies to any $k$-spanner construction, we can apply it to distributed spanner constructions. We assume that every node knows $r$, the desired amount of fault tolerance.

Theorem 2.3. If there is a distributed algorithm $\mathcal{A}$ that on every graph builds a $k$-spanner of size $f(n)$ in $t(n)$ rounds, then there is a distributed algorithm that on any graph builds with high probability an $r$-fault tolerant $k$-spanner of size $O(r^3 \log n \cdot f(2n/r))$ in $O(r^3 \log n \cdot t(n))$ rounds.

Proof. The algorithm is simple: $O(r^3 \log n)$ times, each node independently decides whether or not to join $J$ with probability $1 - 1/r$, and then $\mathcal{A}$ is run on the remainder. This obviously takes at most $O(r^3 \log n \cdot t(n))$ rounds, and the analysis of Theorem 2.1 proves the desired size and fault-tolerance guarantees. □
Corollary 2.4. There is a distributed algorithm in the LOCAL model that in $O(k r^3 \log n)$ rounds constructs with high probability an $r$-fault tolerant $k$-spanner with at most $O(k r^{2-2/(k+1)} n^{1+2/(k+1)} \log n)$ edges.



Proof. Apply Theorem 2.3 to the distributed deterministic spanner construction of Derbel, Gavoille, Peleg, and Viennot [DGPV08], which has size $O(k n^{1+2/(k+1)})$ and runs in $O(k)$ rounds. □



3 Unit-Length r-Fault Tolerant 2-Spanner 

We now move from general $k$ to the specific case of $k = 2$. It is easy to see (and has long been known) that no non-trivial absolute bounds on the size of a 2-spanner are possible, so following previous work, we instead consider the approximation version. In this section we will mostly work in the directed setting in which every edge $e$ has an arbitrary cost $c_e \ge 0$. This is obviously more general than the undirected, unit-cost setting considered in Section 2; we can work in this setting because of our additional assumptions that $k = 2$ and edge lengths are unit. Recent work of Dinitz and Krauthgamer [DK10] achieves approximation ratio $O(r \log n)$ for the unit-length $r$-fault tolerant 2-spanner problem, and an $O(r \log \Delta)$ upper bound on the integrality gap (where $\Delta$ is the maximum degree). Here we improve these results to $O(\log n)$ and $O(\log \Delta)$ (for all $r$) via a different LP relaxation, and also provide a distributed implementation.



3.1 The Previous LP Relaxation 



The relaxation in [DK10] uses, at a high level, a characterization of $r$-fault tolerant 2-spanners based on flows where "for every set of $r$ faults, it is possible to send one unit of (integral) flow from $u$ to $v$ along paths of length at most 2 for any edge $(u,v)$ still present in the graph once the faults have been removed". More formally, for each $(u,v) \in E$ let $\mathcal{P}_{u,v}$ denote the paths of length exactly two from $u$ to $v$, so $\mathcal{P}_{u,v} \cup \{(u,v)\}$ is the set of all paths of length at most 2. As in Section 2, for any possible fault set $F \subseteq V$ with $|F| \le r$ let $E_F$ be the set of edges in $E$ with neither endpoint in $F$. Let $\mathcal{P}^F_{u,v}$ be the subset of $\mathcal{P}_{u,v} \cup \{(u,v)\}$ that still survives in $E_F$. The integer program (IP) used by Dinitz and Krauthgamer [DK10] is presented below as IP (2).



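$$
\begin{aligned}
\min\ & \sum_{e \in E} c_e x_e \\
\text{s.t.}\ & \sum_{P \in \mathcal{P}^F_{u,v} : e \in P} f^F_P \le x_e
  && \forall F \subseteq V : |F| \le r,\ \forall (u,v) \in E_F,\ \forall e \in E_F \\
& \sum_{P \in \mathcal{P}^F_{u,v}} f^F_P \ge 1
  && \forall F \subseteq V : |F| \le r,\ \forall (u,v) \in E_F \\
& x_e \in \{0, 1\} && \forall e \in E \\
& f^F_P \in \{0, 1\}
  && \forall F \subseteq V : |F| \le r,\ \forall (u,v) \in E_F,\ \forall P \in \mathcal{P}^F_{u,v}
\end{aligned}
\tag{2}
$$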



This formulation has capacity variables $x_e$ for every edge $e$, flow variables $f^F_P$ for every possible fault set $F$ and every path $P \in \mathcal{P}^F_{u,v}$ (for every $(u,v) \in E_F$), and constraints that require flows to obey the capacities and still send one unit of flow for every possible fault set. Even though there is an exponential number of both constraints and variables, its LP relaxation can be solved in polynomial time [DK10].
While IP (2) is the obvious integer programming formulation of the $r$-fault tolerant 2-spanner problem, its straightforward relaxation to a linear program is not strong enough to give an approximation that is independent of $r$ (despite having an exponential number of both constraints and variables). An easy way to see this is by considering the complete graph. On the complete graph, every vertex obviously needs at least $r$ incoming and outgoing edges, or else it could be isolated with fewer than $r$ faults. So on $K_n$ the optimum spanner has size at least $rn$. On the other hand, when we relax the integrality constraints we can set the capacity of every edge to $1/(n - r - 2)$ and still have enough capacity to send one unit of flow from any vertex to any other even after $r$ of them have failed. So the linear program has cost at most $n^2/(n - r - 2)$, which is $O(n)$ as long as $r \le cn$ for some constant $c < 1$. Thus the integrality gap of the relaxation is $\Omega(r)$ for an extremely wide range of $r$.



3.2 A New LP Relaxation 

To get around this problem, we will use a different relaxation based on weighted flow. Before we give our formulation, we first prove a simple and useful characterization of $r$-fault tolerant 2-spanners:

Lemma 3.1. For any (directed) graph $G = (V,E)$, a subgraph $H = (V,E')$ is an $r$-fault tolerant 2-spanner if and only if for every $(u,v) \in E$ either $(u,v) \in E'$ or there are at least $r + 1$ paths of length 2 from $u$ to $v$ in $E'$.

Proof. Let $H$ be an $r$-fault tolerant 2-spanner of $G$, and for the sake of contradiction assume that there is some $(u,v) \in E$ that is not in $E'$ and for which there are at most $r$ paths of length 2 from $u$ to $v$ in $E'$. Let $W \subseteq V$ be the vertices that are the midpoints of these paths. Then if we let our fault set $F$ be $W$, in the remaining graph $H \setminus W$ there is no path of length at most 2 from $u$ to $v$, while in $G \setminus W$ the edge $(u,v)$ still exists. Thus $H$ is not an $r$-fault tolerant 2-spanner, giving the contradiction.

For the other direction, suppose that for every $(u,v) \in E$ either $(u,v) \in E'$ or there are at least $r + 1$ paths of length 2 from $u$ to $v$ in $E'$. Let $F \subseteq V$ with $|F| \le r$ be some fault set. We need to show that $H \setminus F$ is a valid 2-spanner for $G \setminus F$, so let $(u,v) \in E$ with $u,v \notin F$ be an arbitrary edge in $G \setminus F$. If $(u,v) \in E'$ then obviously $H$ preserves its distance exactly, and if $(u,v) \notin E'$ then by assumption there are at least $r + 1$ paths from $u$ to $v$ of length 2 in $E'$. At most $r$ of the intermediate vertices on those paths can be in $F$, so in $H \setminus F$ there is at least one such path remaining. □
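The characterization in Lemma 3.1 is purely combinatorial, so it can be checked without enumerating fault sets. This Python sketch assumes a directed, unit-length graph without self-loops given as lists of `(u, v)` pairs; `satisfies_lemma_3_1` is a hypothetical name. It counts, for each missing edge, the two-paths $u \to z \to v$ that survive in $E'$.

```python
from collections import defaultdict


def satisfies_lemma_3_1(g_edges, h_edges, r):
    """Directed, unit-length setting; assumes no self-loops.
    True iff every (u, v) in E is in E' or has at least r + 1
    two-paths u -> z -> v inside E'."""
    h = set(h_edges)
    out_h = defaultdict(set)
    for a, b in h:
        out_h[a].add(b)
    for u, v in g_edges:
        if (u, v) in h:
            continue  # edge kept, distance preserved exactly
        two_paths = sum(1 for z in out_h[u] if (z, v) in h)
        if two_paths < r + 1:
            return False  # failing all midpoints would disconnect u from v
    return True
```

For example, if $(u,v)$ is dropped but two midpoints $w_1, w_2$ remain, the subgraph is 1-fault tolerant but not 2-fault tolerant.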

With this lemma in hand, it is easy to see that the following integer program is an exact 
formulation of the r-fault tolerant 2-spanner problem. 



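$$
\begin{aligned}
\min\ & \sum_{e \in E} c_e x_e \\
\text{s.t.}\ & \sum_{P \in \mathcal{P}_{u,v} : e \in P} f_P \le x_e
  && \forall (u,v) \in E,\ \forall e \in E \\
& (r + 1)\, x_{(u,v)} + \sum_{P \in \mathcal{P}_{u,v}} f_P \ge r + 1
  && \forall (u,v) \in E \\
& x_e \in \{0, 1\} && \forall e \in E \\
& f_P \ge 0 && \forall (u,v) \in E,\ \forall P \in \mathcal{P}_{u,v}
\end{aligned}
\tag{3}
$$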



So now we have a different IP formulation than the one that was used in [DK10] to get an $O(r \log n)$-approximation. Unfortunately, its LP relaxation is still not strong enough to yield an approximation ratio independent of $r$; there are still simple examples that give an integrality gap of $\Omega(r)$. For example, consider a graph with nodes $u$ and $v$ and an edge of cost $M$ from $u$ to $v$ (for some arbitrarily large $M$), together with $r$ nodes $w_1, \ldots, w_r$ and an edge of cost 1 from $u$ to $w_i$ and from $w_i$ to $v$ for all $i \in [r]$. The set of all $w_i$ nodes is a valid fault set, so the optimum spanner needs to include the $(u,v)$ edge in order to still be valid. So the optimum spanner has cost at least $M$. On the other hand, the LP can set $x_e$ to 1 for all edges $e$ incident on some $w_i$, set $f_P = 1$ for each of the $r$ paths $(u, w_i, v)$, and set $x_{(u,v)} = 1/(r+1)$; this satisfies the covering constraint with equality, since $(r+1) \cdot \frac{1}{r+1} + r = r + 1$. This has cost of only $M/(r+1) + 2r$. By setting $M$ large enough, we get a gap of $\Omega(r)$.

We will strengthen the relaxation by adding a set of valid inequalities that are essentially the knapsack-cover inequalities of Carr et al. [CFLP00] applied to this IP. Let $(u,v) \in E$, and consider some arbitrary subset $W \subseteq \mathcal{P}_{u,v}$ with $|W| \le r$. If $x_{(u,v)} = 0$, then the covering inequality for $(u,v)$ implies that $\sum_{P \in \mathcal{P}_{u,v}} f_P \ge r + 1$, and thus, since $f_P \le 1$ for every path by the capacity constraints, $\sum_{P \in \mathcal{P}_{u,v} \setminus W} f_P \ge r + 1 - |W|$. On the other hand, if $x_{(u,v)} = 1$ then clearly $(r + 1 - |W|)\, x_{(u,v)} \ge r + 1 - |W|$. So for all $(u,v) \in E$ and all $W \subseteq \mathcal{P}_{u,v}$ with $|W| \le r$, we can add the constraint $(r + 1 - |W|)\, x_{(u,v)} + \sum_{P \in \mathcal{P}_{u,v} \setminus W} f_P \ge r + 1 - |W|$.

These are the knapsack-cover inequalities, and when we add them to our IP formulation and relax 
the integrality constraints we get the following LP relaxation: 



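$$
\begin{aligned}
\min\ & \sum_{e \in E} c_e x_e \\
\text{s.t.}\ & \sum_{P \in \mathcal{P}_{u,v} : e \in P} f_P \le x_e
  && \forall (u,v) \in E,\ \forall e \in E \\
& (r + 1 - |W|)\, x_{(u,v)} + \sum_{P \in \mathcal{P}_{u,v} \setminus W} f_P \ge r + 1 - |W|
  && \forall (u,v) \in E,\ \forall W \subseteq \mathcal{P}_{u,v} : |W| \le r \\
& 0 \le x_e \le 1 && \forall e \in E \\
& f_P \ge 0 && \forall (u,v) \in E,\ \forall P \in \mathcal{P}_{u,v}
\end{aligned}
\tag{4}
$$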

We refer to the first type of constraints as capacity constraints, the second type as knapsack-cover constraints (or inequalities), and the third as multiplicity constraints. This relaxation has a polynomial number of variables but a possibly exponential number of constraints, so we first need to show that we can solve it. To do this we construct a separation oracle, which allows us to solve it in polynomial time by using the Ellipsoid algorithm.

Lemma 3.2. There is a polynomial time algorithm that solves LP (4).

Proof. We want to construct a separation oracle. Note that there are only a polynomial number of capacity constraints and multiplicity constraints, so we can check them all in polynomial time. To find a violated knapsack-cover inequality, note that if there is some $(u,v) \in E$ and some $W \subseteq \mathcal{P}_{u,v}$ that violates the inequality, then the set $W'$ which consists of the $|W|$ paths in $\mathcal{P}_{u,v}$ with the largest $f_P$ values also violates the inequality, since for a fixed size $|W|$ the left-hand side depends on $W$ only through $\sum_{P \in \mathcal{P}_{u,v} \setminus W} f_P$, which this choice minimizes. So for every $(u,v) \in E$ and every $j \in \{0, 1, \ldots, r\}$, it suffices to check the constraint for $(u,v)$ and the $j$ paths in $\mathcal{P}_{u,v}$ with largest flow. Since $r < n$, this takes only polynomial time. □
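The knapsack-cover part of this separation oracle amounts to sorting and scanning prefixes. Below is a minimal Python sketch for a single edge $(u,v)$, assuming the LP values are given as a number `x_uv` and a list `path_flows` of the $f_P$ values for $P \in \mathcal{P}_{u,v}$; the function name and the tolerance `tol` are illustrative assumptions. A full oracle would loop over all edges and also check the capacity and multiplicity constraints.

```python
def violated_knapsack_cover(x_uv, path_flows, r, tol=1e-9):
    """Return the size j of a violated knapsack-cover inequality for one
    edge (u, v), or None if all of them hold.

    For a fixed |W| = j the left-hand side is smallest when W holds the
    j largest flows, so only prefixes of the sorted flow list are tested.
    """
    flows = sorted(path_flows, reverse=True)
    total = sum(flows)
    prefix = 0.0  # flow carried by W, i.e. by the j largest paths
    for j in range(0, min(r, len(flows)) + 1):
        if j > 0:
            prefix += flows[j - 1]
        rhs = r + 1 - j
        lhs = rhs * x_uv + (total - prefix)
        if lhs < rhs - tol:
            return j
    return None
```

On the gap example above with $r = 3$ (so $x_{(u,v)} = 1/4$ and three paths of flow 1), the inequality with $|W| = 1$ is violated: $3 \cdot \frac14 + 2 = 2.75 < 3$.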

3.3 O(log n)-approximation

We now give the main result of this section. 
Theorem 3.3. There is a randomized $O(\log n)$-approximation for Minimum Cost $r$-Fault Tolerant 2-Spanner (for all $r$).



Proof. The first step of the algorithm is to solve LP (4) using Lemma 3.2. We then round the solution using Algorithm 1 below. (This rounding algorithm was designed in [DK10], but for a different relaxation, hence they were forced to set $\alpha = O(r \log n)$ and the analysis therein is not applicable here.)

Algorithm 1: Rounding algorithm for $r$-fault tolerant 2-spanner.

1 Set $\alpha = C \ln n$ (for a large enough constant $C$).

2 For every $v$ choose independently a uniformly random threshold $T_v \in [0, 1]$.

3 Output $E' = \{(u,v) \in E : \min\{T_u, T_v\} \le \alpha \cdot x_{u,v}\}$.
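Algorithm 1 is short enough to state directly in code. The following Python sketch assumes the LP solution is given as a dictionary `x` from directed edges `(u, v)` to values in $[0,1]$; the function name `round_lp` and the default `C = 4.0` are illustrative choices, not values from the paper. Each vertex draws one threshold, and an edge is kept when either endpoint's threshold falls below $\alpha$ times its LP value.

```python
import math
import random


def round_lp(vertices, x, C=4.0, seed=None):
    """Threshold rounding in the style of Algorithm 1.

    x maps each directed edge (u, v) to its LP value; C is an illustrative
    stand-in for the 'large enough constant' in alpha = C ln n.
    """
    rng = random.Random(seed)
    alpha = C * math.log(max(len(vertices), 2))
    T = {v: rng.random() for v in vertices}  # independent uniform thresholds
    return {(u, v) for (u, v), xe in x.items() if min(T[u], T[v]) <= alpha * xe}
```

Edges whose LP value is at least $1/\alpha$ are always kept, which is the observation the analysis uses for paths with large flow.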

We first show that the cost of the solution is likely to be at most $6\alpha$ times the LP value. By a union bound over its two endpoints, the probability that some edge $e$ is selected to be in $E'$ is at most $2\alpha x_e$, so the expected cost of the solution $E'$ is at most $\sum_{e \in E} c_e \cdot 2\alpha x_e = 2\alpha \sum_{e \in E} c_e x_e$. By Markov's inequality, the cost of the solution $E'$ exceeds $6\alpha \sum_{e \in E} c_e x_e$ with probability at most $1/3$.

We now argue that this algorithm returns a valid $r$-fault tolerant 2-spanner with high probability. We say that $E'$ satisfies an edge $(u,v)$ if either $(u,v) \in E'$ or $E'$ contains at least $r + 1$ length-2 paths from $u$ to $v$. By Lemma 3.1, if $E'$ satisfies all edges then it is a valid $r$-fault tolerant 2-spanner. Consider some edge $(u,v) \in E$. Order the paths in $\mathcal{P}_{u,v}$ in nonincreasing order by their flow values in the LP solution, so $P_i$ is the path with the $i$th largest flow. Let $W_i = \{P_1, P_2, \ldots, P_i\}$, and let $i^* = \max\{i : f_{P_i} \ge 1/\alpha\}$ (with $i^* = 0$ if no path has flow at least $1/\alpha$). If $i^* > r$ then $r + 1$ paths have flow value at least $1/\alpha$, so by the capacity constraints both of the edges in each path have $x$ value at least $1/\alpha$, so they are included in $E'$ with probability 1. Thus $(u,v)$ is satisfied with probability 1.

On the other hand, suppose that $i^* \le r$. Let us denote $r' = r + 1 - i^* \ge 1$. By the knapsack-cover constraint for $(u,v)$ and $W_{i^*}$, we know that

$$ r' x_{(u,v)} + \sum_{P \in \mathcal{P}_{u,v} \setminus W_{i^*}} f_P \ge r'. $$



If $r' x_{(u,v)} \ge r'/2$ then $x_{(u,v)} \ge 1/2$ and thus (as $\alpha \ge 2$) $(u,v)$ is included in $E'$ with probability 1, satisfying $(u,v)$. Otherwise it must be the case that $\sum_{P \in \mathcal{P}_{u,v} \setminus W_{i^*}} f_P > r'/2$. For $P \in \mathcal{P}_{u,v}$, let $I_P$ be an indicator for the event that the $T$ value of the middle vertex is at most $\alpha$ times the flow value $f_P$ (formally, if $P = (u,z,v)$ then $I_P = \mathbf{1}[T_z \le \alpha f_P]$), and observe that this event implies that both edges of $P$ are included in $E'$ (because then we have $T_z \le \alpha f_P \le \alpha \min\{x_{(u,z)}, x_{(z,v)}\}$). Note that for $P \in W_{i^*}$, we have $I_P = 1$ with probability 1. For $P \in \mathcal{P}_{u,v} \setminus W_{i^*}$, we have $I_P = 1$ with probability $\alpha f_P \in [0, 1]$. The number of paths from $\mathcal{P}_{u,v} \setminus W_{i^*}$ included in $E'$ is clearly at least $\sum_{P \in \mathcal{P}_{u,v} \setminus W_{i^*}} I_P$, and we can bound that last quantity (which is a sum of independent indicators, since distinct paths have distinct middle vertices) by a Chernoff bound (see e.g. [MR95, DP09]). Its expectation is



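$$ \mathbb{E}\Bigl[\sum_{P \in \mathcal{P}_{u,v} \setminus W_{i^*}} I_P\Bigr] \ge \sum_{P \in \mathcal{P}_{u,v} \setminus W_{i^*}} \alpha f_P \ge \alpha r'/2, $$

so by our choice of $\alpha = C \log n$ for a large enough $C$,

$$ \Pr\Bigl[\sum_{P \in \mathcal{P}_{u,v} \setminus W_{i^*}} I_P \le \alpha r'/4\Bigr] \le e^{-\Omega(\alpha r')} \le 1/n^{\Omega(C)}. \tag{5} $$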
Thus, with high probability the total number of length-2 paths between $u$ and $v$ included in $E'$ is at least $i^* + \alpha r'/4 \ge i^* + r' = r + 1$ (as $\alpha \ge 4$), and thus $(u,v)$ is satisfied. The theorem follows by taking a union bound over these events for all edges $(u,v)$, and the aforementioned event that the solution's cost exceeds $6\alpha$ times the LP value. □



3.4 Bounded-Degree Graphs 

When the maximum degree of the graph is bounded by $\Delta$ and the edge costs $c_e$ are all 1, we can improve Theorem 3.3 slightly and give an $O(\log \Delta)$-approximation. We simply change the inflation parameter $\alpha$ in Algorithm 1 to be $O(\log \Delta)$ instead of $O(\log n)$. We then need a more careful analysis, using an algorithmic version of the Lovász Local Lemma.

Theorem 3.4. There is a (randomized) $O(\log \Delta)$-approximation for the (directed) $r$-fault tolerant 2-spanner problem on graphs in which $c_e = 1$ for all $e \in E$ and the maximum (in and out) degree is at most $\Delta \ge 2$.

We shall use the following constructive version of the symmetric Lovász Local Lemma, which is an immediate corollary of the nonsymmetric version proved by Moser and Tardos [MT10].



Lemma 3.5 (Moser and Tardos [MT10]). Let $\mathcal{V}$ be a finite set of mutually independent random variables in a probability space. Let $\mathcal{A}$ be a finite set of events determined by the variables in $\mathcal{V}$. Suppose that each $A \in \mathcal{A}$ is mutually independent of all but at most $d$ other events in $\mathcal{A}$, and suppose that $\Pr[A] \le p$ for all $A \in \mathcal{A}$. If $ep(d+1) \le 1$ then there exists an assignment of values to the variables $\mathcal{V}$ such that no event $A \in \mathcal{A}$ occurs. Moreover, there is a randomized algorithm that finds such an assignment in expected time $O(|\mathcal{V}| + |\mathcal{A}| \cdot |\mathcal{V}|/d)$.
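The algorithm behind Lemma 3.5 is a simple resampling loop. The following Python sketch is a generic version of the Moser–Tardos procedure under these assumptions: events are given as pairs of (variables the event depends on, predicate on the current assignment), and `sample(name, rng)` draws a fresh independent value for one variable; all names are hypothetical. While some bad event occurs, it picks one (here simply the first found; the Moser–Tardos analysis allows any selection rule) and resamples only that event's variables. The `max_rounds` cap is a safety guard added for the sketch, not part of the lemma.

```python
import random


def moser_tardos(variables, sample, events, seed=None, max_rounds=10**6):
    """Generic Moser-Tardos resampling (illustrative sketch).

    variables: iterable of variable names.
    sample(name, rng): draws a fresh, independent value for one variable.
    events: list of (vars_of_event, is_bad) pairs, where is_bad(assignment)
            reports whether the bad event currently occurs.
    """
    rng = random.Random(seed)
    assignment = {name: sample(name, rng) for name in variables}
    for _ in range(max_rounds):
        bad = next((ev for ev in events if ev[1](assignment)), None)
        if bad is None:
            return assignment  # no bad event occurs
        for name in bad[0]:
            assignment[name] = sample(name, rng)  # resample only this event's variables
    raise RuntimeError("no good assignment found within max_rounds")
```

In the application that follows, the variables would be the thresholds $T_v$, drawn uniformly from $[0,1]$, and the bad events would be the events $A_{u,v}$ together with the cost events $B_u$.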



Proof of Theorem 3.4. Consider a directed graph $G$ with unit edge costs $c_e = 1$ and vertex degrees bounded by $\Delta$. Consider a solution to the LP relaxation and apply Algorithm 1 to it but with inflation factor $\alpha = C \ln \Delta$ instead of $C \ln n$.

For an edge $(u,v) \in E$, let $A_{u,v}$ be the event that $E'$ does not satisfy this edge, i.e. $(u,v) \notin E'$ and the graph $G' = (V, E')$ has fewer than $r + 1$ paths of length 2 from $u$ to $v$. The analysis of Theorem 3.3 shows (after modifying (5) with our new value of $\alpha$) that

$$ \Pr[A_{u,v}] \le e^{-\Omega(\alpha)} = \Delta^{-\Omega(C)}. $$

Furthermore, note that $A_{u,v}$ depends only on the random variables $T_z$ for $z \in (N^+(u) \cap N^-(v)) \cup \{u, v\}$. Here and throughout, $N^+(u)$ and $N^-(u)$ denote the out-neighbors and in-neighbors of $u \in V$, respectively. Observe that $A_{u,v}$ is independent of all but $O(\Delta^3)$ other events $A_{u',v'}$, simply because there are at most $O(\Delta)$ choices for each of $z$, $u'$, and $v'$.



We could now apply Lemma 3.5 to these events. The underlying mutually independent random variables $\mathcal{V}$ would be the $T_u$ variables, and the "bad events" $\mathcal{A}$ would be the events $A_{u,v}$. This would give us an algorithm that in polynomial time returns a valid $r$-fault tolerant 2-spanner, but we also need a bound on the cost of this spanner. The analysis via Markov's inequality in Theorem 3.3 is too weak now, because when we apply the algorithm of Lemma 3.5 we change the overall distribution in a way that might destroy the cost bound. We need to integrate the cost analysis into the events that Lemma 3.5 is applied to, so at a high level we employ a more local approach where the cost of $E'$ is split among the vertices and events bounding the cost are added to the $A_{u,v}$ events. More specifically, we shall create many events, each of which controls how the cost of $E'$ compares locally with the cost of the LP, and then apply the Local Lemma to the new events together with the $\{A_{u,v}\}$ events. A formal argument follows.

For each vertex $u \in V$, let the random variable $Z_u^+$ be the number of outgoing edges $(u,v)$ for which $T_v \le \alpha x_{u,v}$, and let $Z_u^-$ be the number of incoming edges $(v,u)$ for which $T_v \le \alpha x_{v,u}$. Informally, $Z_u^+ + Z_u^-$ is the number of edges incident to $u$ whose inclusion in $E'$ can be charged to their other endpoint. The algorithm's cost is $|E'| \le \sum_{u \in V}(Z_u^+ + Z_u^-)$, since every edge $(u,v)$ included in $E'$ adds 1 to either $Z_u^+$ or $Z_v^-$ (or both).
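To make the charging argument concrete, the sketch below assumes the threshold rounding implied by the definitions of $Z_u^\pm$: each vertex $w$ draws $T_w$ uniformly from $[0,1]$, and an edge $e$ joins $E'$ when the threshold of either endpoint is at most $\alpha x_e$. The exact form of Algorithm 1 is not reproduced here, so treat this rounding rule, the function name `round_and_charge`, and the random test data as illustrative assumptions. The code computes $E'$, $Z_u^+$ and $Z_u^-$ and lets us check $|E'| \le \sum_u (Z_u^+ + Z_u^-)$.

```python
import random
from collections import defaultdict


def round_and_charge(x, alpha, seed=0):
    """x maps directed edges (u, v) to LP values x_{u,v}.

    Assumed rounding: every vertex w draws T_w ~ U[0, 1], and an edge e
    joins E' when T_w <= alpha * x_e for at least one endpoint w of e.
    """
    rng = random.Random(seed)
    vertices = sorted({w for edge in x for w in edge})
    T = {w: rng.random() for w in vertices}
    e_prime = {
        (u, v) for (u, v), xe in x.items()
        if T[u] <= alpha * xe or T[v] <= alpha * xe
    }
    z_plus, z_minus = defaultdict(int), defaultdict(int)
    for (u, v), xe in x.items():
        if T[v] <= alpha * xe:
            z_plus[u] += 1   # outgoing edge of u, triggered by its head v
        if T[u] <= alpha * xe:
            z_minus[v] += 1  # incoming edge of v, triggered by its tail u
    return e_prime, z_plus, z_minus


gen = random.Random(3)
x = {(u, v): 0.3 * gen.random()
     for u in range(8) for v in range(8) if u != v and gen.random() < 0.4}
e_prime, z_plus, z_minus = round_and_charge(x, alpha=2.0, seed=7)
```

Each edge in `e_prime` is counted at least once, in `z_plus` of its tail or `z_minus` of its head, which is exactly the inequality used for the cost.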

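For each vertex $u \in V$, let $B_u$ be the event that $Z_u^+ + Z_u^- > 4\alpha\big(\sum_{(u,v)\in E} x_{u,v} + \sum_{(v,u)\in E} x_{v,u}\big)$. We would like to show that this event happens only with small probability. Note that $\mathbb{E}[Z_u^+] = \sum_{(u,v)\in E}\min\{\alpha x_{u,v}, 1\} \le \alpha\sum_{(u,v)\in E} x_{u,v}$, so by a Chernoff bound (see e.g. [MR95, DP09]) we get

$$\Pr\Big[Z_u^+ > 2\alpha\sum_{(u,v)\in E} x_{u,v}\Big] \le \exp\Big(-\frac{\alpha}{3}\sum_{(u,v)\in E} x_{u,v}\Big) \le e^{-\alpha/3} = \Delta^{-C/3},$$

where in the final inequality we assume there is at least one outgoing edge from $u$ and thus $\sum_{(u,v)\in E} x_{u,v} \ge 1$ (since otherwise $Z_u^+ = 0$ with probability 1). Using a similar argument to bound $Z_u^-$, we get

$$\Pr[B_u] \le \Pr\Big[Z_u^+ > 2\alpha\sum_{(u,v)\in E} x_{u,v}\Big] + \Pr\Big[Z_u^- > 2\alpha\sum_{(v,u)\in E} x_{v,u}\Big] \le 2\Delta^{-C/3}.$$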
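The Chernoff step can be sanity-checked numerically. The sketch below uses hypothetical LP values `xs` on the outgoing edges of a single vertex, draws $Z_u^+$ as a sum of independent Bernoulli($\min\{\alpha x, 1\}$) variables, estimates $\Pr[Z_u^+ > 2\alpha\sum x]$ by simulation, and compares it with the multiplicative Chernoff bound $\exp(-\alpha\sum x/3)$, which is valid because $\alpha\sum x$ upper-bounds $\mathbb{E}[Z_u^+]$. We assume natural logarithms, so that with $\sum x \ge 1$ and $\alpha = C\ln\Delta$ the bound is at most $\Delta^{-C/3}$.

```python
import math
import random


def tail_vs_chernoff(xs, alpha, trials=20_000, seed=0):
    """Empirical Pr[Z > 2*alpha*sum(xs)] versus the bound exp(-alpha*sum(xs)/3)."""
    rng = random.Random(seed)
    probs = [min(alpha * x, 1.0) for x in xs]  # Pr[T_v <= alpha * x_{u,v}]
    mu_upper = alpha * sum(xs)                 # upper bound on E[Z]
    hits = 0
    for _ in range(trials):
        z = sum(rng.random() < p for p in probs)
        hits += z > 2 * mu_upper
    return hits / trials, math.exp(-mu_upper / 3)


delta_max, C = 20, 1.0
alpha = C * math.log(delta_max)  # alpha = C ln(Delta), natural log assumed
xs = [0.06] * delta_max          # hypothetical LP values with sum(xs) >= 1
empirical, bound = tail_vs_chernoff(xs, alpha)
```

The simulated tail is far below the analytic bound here; the bound is loose, but it is the form that feeds into the Local Lemma.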



We now apply Lemma 3.5 to the events $A_{u,v}$ and $B_u$. Note that $B_u$ depends only on the random variables $T_z$ for $z \in N^+(u) \cup N^-(u)$, and recall that $A_{u,v}$ depends only on $T_z$ for $z \in (N^+(u) \cap N^-(v)) \cup \{u\}$. Thus each event is mutually independent of all but $O(\Delta^3)$ other events: for an event $A_{u,v}$ we exclude at most $\Delta^3$ events $A_{u',v'}$ and at most $2\Delta^2$ events $B_{u'}$; for an event $B_u$ we exclude at most $4\Delta^2$ events $B_{u'}$ and at most $2\Delta^3$ events $A_{u',v'}$. We can thus apply Lemma 3.5 with dependency parameter $d = O(\Delta^3)$, because by setting $C$ sufficiently large, the probability of each event is at most a suitable $p = \Delta^{-\Omega(C)} \le 1/(e(d+1))$. Since the number of events is at most $O(n^2)$ and the number of underlying variables is only $n$, we conclude that there is a polynomial-time algorithm that finds values of the underlying variables $T_u$ such that none of the events $A_{u,v}$ and $B_u$ occur. This implies that $G' = (V,E')$ is an r-fault tolerant 2-spanner of $G$ of cost

$$|E'| \le \sum_{u\in V}(Z_u^+ + Z_u^-) \le 8\alpha\sum_{(u,v)\in E} c_{u,v}\, x_{u,v} \le O(\log\Delta)\cdot \mathrm{LP},$$

which proves Theorem 3.4.

□
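The final application of Lemma 3.5 needs $e\,p\,(d+1) \le 1$. The arithmetic sketch below takes, as an assumption, $d = 2\Delta^3 + 4\Delta^2$ (the larger of the two dependency counts for events $A_{u,v}$ and $B_u$, as we read the proof) and, purely for illustration, uses the bound $2\Delta^{-C/3}$ derived for $B_u$ as the common probability bound $p$ for all events. The proof itself only asserts $p = \Delta^{-\Omega(C)}$, so the constants produced here are not the paper's. The code searches for the smallest integer $C$ satisfying the condition.

```python
import math


def lll_condition_holds(delta, C):
    """Check e * p * (d + 1) <= 1 for the illustrative choice of p and d."""
    d = 2 * delta**3 + 4 * delta**2  # assumed dependency bound
    p = 2 * delta ** (-C / 3)        # assumed common probability bound
    return math.e * p * (d + 1) <= 1


def smallest_C(delta):
    """Smallest integer C >= 1 for which the (monotone) condition holds."""
    C = 1
    while not lll_condition_holds(delta, C):
        C += 1
    return C
```

For example, with $\Delta = 2$ we have $d + 1 = 33$, and the condition $2^{C/3} \ge 66e \approx 179.4$ first holds at $C = 23$. As $\Delta$ grows, the required $C$ decreases towards a constant, matching the statement that a sufficiently large constant $C$ suffices.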



3.5 Distributed Construction 



We now show how to adapt the $O(\log n)$-approximation that we designed in Section 3.3 to give a distributed $O(\log n)$-approximation. We assume that communication along an edge is bidirectional, even if the graph is directed. The main problem that we run into when trying to design a distributed algorithm based on Algorithm 1 is solving the linear program. If we had a solution, and every vertex knew the $x_e$ values of its incident edges, then we would be done: the rounding scheme in Algorithm 1 is entirely local, so every vertex $v$ would just locally pick its threshold and include the appropriate edges. If we want both endpoints of an edge to know that it has been included in the spanner, we can then just have every vertex tell all of its neighbors (in a single round) which edges it bought based on its threshold.
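The locality claim can be illustrated with a small synchronous simulation. Assuming threshold rounding (vertex $w$ buys every incident edge $e$ with $T_w \le \alpha x_e$; this is our reading of Algorithm 1, which is not reproduced here), each vertex decides using only its own threshold and the $x_e$ values of its incident edges, and then a single round of messages to neighbors makes both endpoints agree on $E'$. The function name and data layout are hypothetical.

```python
import random


def local_rounding_round(x, alpha, seed=0):
    """x maps directed edges (u, v) to LP values known to both endpoints."""
    rng = random.Random(seed)
    incident = {}
    for (u, v) in x:
        incident.setdefault(u, []).append((u, v))
        incident.setdefault(v, []).append((u, v))
    # Local step: each vertex draws its own threshold and buys incident edges.
    bought = {}
    for w in sorted(incident):
        t_w = rng.random()
        bought[w] = {e for e in incident[w] if t_w <= alpha * x[e]}
    # One communication round: tell the other endpoint of each bought edge.
    inbox = {w: set() for w in incident}
    for w, edges in bought.items():
        for (u, v) in edges:
            inbox[v if w == u else u].add((u, v))
    # Each vertex now knows exactly which of its incident edges are in E'.
    return {w: bought[w] | inbox[w] for w in incident}


gen = random.Random(11)
x = {(u, v): 0.4 * gen.random()
     for u in range(6) for v in range(6) if u != v and gen.random() < 0.5}
known = local_rounding_round(x, alpha=1.5, seed=4)
```

After the single round, an edge appears in `known[u]` exactly when it appears in `known[v]`, so both endpoints hold the same view of $E'$.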

In order to (approximately) solve the LP, we partition the graph into clusters, solve the LP separately on each cluster, repeat this process several times, and finally take the average. This technique is quite similar to the work of Kuhn, Moscibroda, and Wattenhofer [KMW06], who showed how to approximately solve positive LPs using the graph decompositions of Linial and Saks [LS93].

The fundamental tool that we will use is the ability to quickly compute a good padded decomposition, which is a basic tool in metric embeddings but has found numerous applications in approximation and online algorithms (e.g., for network design problems). This notion is essentially a version of low-diameter decompositions, such as sparse covers [AP90]. This specific version was (probably) introduced by Rao [Rao99], who observed that it can be derived from an earlier construction of Klein, Plotkin and Rao [KPR93]. An explicit formulation of padded decompositions appeared only later, in [KL03, GKL03], and used a construction of Bartal [Bar96]. The definition given below is actually a special case of the usual notion, where the so-called padding requirement is a unit radius around each vertex, i.e., just the vertex's neighborhood.

Let $\mathcal{T} = \mathcal{T}(V)$ denote the set of all partitions of $V$ (irrespective of the graph structure). For a partition $P \in \mathcal{T}$, we call each set $C \in P$ a cluster. Let $G'$ be the undirected graph corresponding to $G$, and define the diameter of $C$ to be $\mathrm{diam}(C) = \max_{u,v \in C} d_{G'}(u,v)$ (this is usually called the weak diameter, because it corresponds to the shortest $u$-$v$ path in $G'$, possibly going out of $C$ along the way). Finally, for $x \in V$ and a partition $P \in \mathcal{T}$, we let $P(x)$ denote the cluster of $P$ that contains $x$.



Definition 3.6. A padded decomposition of $G$ is a probability measure $\mu$ on $\mathcal{T}$ that satisfies the following two conditions:

1. For every $P \in \mathrm{supp}(\mu)$ and every $C \in P$ we have $\mathrm{diam}(C) \le O(\log n)$.

2. For every $x \in V$ we have $\Pr_{P \sim \mu}[N(x) \subseteq P(x)] \ge 1/2$.
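The two conditions of Definition 3.6 can be checked for any single partition. The sketch below (function names hypothetical) computes weak diameters with breadth-first search in the undirected graph $G'$, so shortest paths may leave the cluster, and returns the set of vertices $x$ with $N(x) \subseteq P(x)$; averaging the size of that set over sampled partitions estimates the padding probability in condition 2.

```python
import math
from collections import deque


def bfs_dist(adj, src):
    """Hop distances from src in the undirected graph given by adj."""
    dist = {src: 0}
    queue = deque([src])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    return dist


def weak_diameter(adj, cluster):
    """max over u, v in the cluster of d_{G'}(u, v); paths may leave the cluster."""
    best = 0
    for u in cluster:
        dist = bfs_dist(adj, u)
        best = max(best, max(dist.get(v, math.inf) for v in cluster))
    return best


def padded_vertices(adj, partition):
    """Vertices x whose whole neighborhood N(x) lies in the cluster P(x)."""
    owner = {v: i for i, cluster in enumerate(partition) for v in cluster}
    return {x for x in adj if all(owner[w] == owner[x] for w in adj[x])}


path = {i: {j for j in (i - 1, i + 1) if 0 <= j < 6} for i in range(6)}
```

On the path $0 - 1 - \dots - 5$, the cluster $\{0, 2\}$ has weak diameter 2 (through vertex 1 outside it), even though it is disconnected as an induced subgraph.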

It is known that every metric space admits such a padded decomposition, and there are polynomial-time randomized algorithms to sample from such a decomposition [Bar96, FRT04]. It is convenient to assign to each cluster a vertex, called the cluster center. One could always choose an arbitrary vertex in the cluster (e.g., the one with the smallest identifier). The next lemma is a straightforward adaptation of the construction of Bartal [Bar96] to the distributed context; it can also be viewed as a slight modification of the graph decompositions of Linial and Saks [LS93].



Lemma 3.7. There is an algorithm in the LOCAL model that runs in $O(\log n)$ rounds and with high probability samples from a padded decomposition, so that every vertex knows the cluster containing it, i.e., all other vertices in the same cluster. Every cluster $C$ also has a cluster center $v$ (which is not necessarily in the cluster) with the property that $\mathrm{diam}(C \cup \{v\}) \le O(\log n)$.



Proof. The construction of Bartal [Bar96] is simple, and is usually described iteratively. (As mentioned above, the padding property is not formally proved there, but it can be derived from the analysis therein; see also [KL03, GKL03].) Working in the metric completion of $G$ (so removing vertices does not change distances), repeat the following procedure until every vertex has been assigned to some cluster: pick an arbitrary vertex $u$ from those that have not yet been assigned a cluster; randomly pick a radius $r_u$ from the geometric distribution with some constant parameter $p > 0$; and create a new cluster consisting of $u$ and all unclustered vertices that are within distance $r_u$ of $u$.
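A minimal sequential rendering of this ball-carving procedure for an unweighted graph is sketched below. Distances are always measured in the original graph (the role of the metric completion), the "arbitrary" unclustered vertex is taken to be the one with the smallest ID, and the radius is drawn from the geometric distribution $\Pr[r = k] = (1-p)^k p$ for $k = 0, 1, 2, \dots$; that parameterization and the default $p = 1/2$ are assumptions for illustration.

```python
import math
import random
from collections import deque


def bfs_dist(adj, src):
    """Hop distances from src in the whole graph."""
    dist = {src: 0}
    queue = deque([src])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    return dist


def geometric(rng, p):
    """Number of failures before the first success: Pr[k] = (1 - p)^k * p."""
    k = 0
    while rng.random() >= p:
        k += 1
    return k


def bartal_sequential(adj, p=0.5, seed=0):
    rng = random.Random(seed)
    unclustered = set(adj)
    clusters = []
    while unclustered:
        u = min(unclustered)   # "arbitrary" choice: smallest ID
        r_u = geometric(rng, p)
        dist = bfs_dist(adj, u)  # distances in all of G, so removals do not matter
        ball = {v for v in unclustered if dist.get(v, math.inf) <= r_u}
        clusters.append(ball)    # u is in ball since dist[u] = 0
        unclustered -= ball
    return clusters


n = 30
cycle = {i: {(i - 1) % n, (i + 1) % n} for i in range(n)}
clusters = bartal_sequential(cycle, seed=2)
```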

While this procedure is phrased iteratively, it can clearly be made distributed with only minor changes. First, every vertex $u \in V$ locally chooses a value $r_u$ from the geometric distribution with parameter $p$. Then every node $u$ simultaneously sends a message containing the ID of $u$ to all nodes within distance $\min\{r_u, O(\log n)\}$ of $u$. Note that this takes only $O(\log n)$ rounds, and with high probability $\max_u\{r_u\} \le O(\log n)$ (in fact we could truncate the geometric distribution so that $r_u$ is always less than $O(\log n)$, since the analysis of [KL03] shows that this does not significantly affect the padding probability). Now every node chooses as its cluster center the sender with the smallest ID (i.e., the sender that comes earliest in the lexicographic ordering) among the vertices whose messages it received. The only difference between the output of this algorithm and Bartal's algorithm is that in Bartal's algorithm only unclustered nodes can be the center of a new cluster, while in our variation every vertex (in lexicographic order) gets the chance to create a cluster (of which it might not be a member itself). It is well known (see e.g. [KL03, GKL03]) that this change does not affect anything in the analysis.

We remark that the construction above has a natural choice of cluster centers. Under this choice, a cluster $C$ might not contain its center $v \in V$, but $\mathrm{diam}(C \cup \{v\}) \le O(\log n)$, which is sufficient for our purposes. □
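The distributed variant can be simulated centrally as follows. Each vertex $u$ draws $r_u$ from a geometric distribution (the parameterization $\Pr[r_u = k] = (1-p)^k p$ is an assumption), truncated at a cap that stands in for the $O(\log n)$ bound; we use $\lceil \log_2 n \rceil$, a hypothetical choice. Each $u$ floods its ID to every vertex within $\min\{r_u, \text{cap}\}$ hops, and each vertex adopts the smallest ID it hears as its cluster center. Because every member lies within `cap` hops of its center, $\mathrm{diam}(C \cup \{v\}) \le 2\cdot\text{cap}$. The helper `padding_rate` estimates the probability in condition 2 of Definition 3.6 for one vertex; no particular value is claimed.

```python
import math
import random
from collections import deque


def bfs_within(adj, src, radius):
    """Hop distances from src to every vertex at most `radius` hops away."""
    dist = {src: 0}
    queue = deque([src])
    while queue:
        u = queue.popleft()
        if dist[u] == radius:
            continue  # the message is not forwarded beyond the radius
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    return dist


def distributed_decomposition(adj, p=0.5, seed=0):
    rng = random.Random(seed)
    cap = max(1, math.ceil(math.log2(len(adj))))  # stand-in for the O(log n) cap
    heard = {v: [] for v in adj}
    for u in sorted(adj):  # all vertices act simultaneously in the real protocol
        r_u = 0
        while rng.random() >= p:  # geometric radius, Pr[r_u = k] = (1-p)^k p
            r_u += 1
        for v in bfs_within(adj, u, min(r_u, cap)):
            heard[v].append(u)
    center = {v: min(heard[v]) for v in adj}  # smallest sender ID wins
    clusters = {}
    for v, c in center.items():
        clusters.setdefault(c, set()).add(v)
    return center, clusters, cap


def padding_rate(adj, x, trials=200):
    """Fraction of sampled partitions in which N(x) lies in x's cluster."""
    hits = 0
    for s in range(trials):
        center, _, _ = distributed_decomposition(adj, seed=s)
        hits += all(center[w] == center[x] for w in adj[x])
    return hits / trials


n = 24
cycle = {i: {(i - 1) % n, (i + 1) % n} for i in range(n)}
center, clusters, cap = distributed_decomposition(cycle, seed=5)
```

Every vertex hears at least its own message (distance 0), so every vertex receives a center, and the smallest-ID rule lets a vertex head a cluster it does not belong to.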

Now that we can construct padded decompositions, we want to use them to decompose the LP into "local" parts. Let $P$ be a partition sampled from $\mu$. For each cluster $C \in P$, let $N(C)$ denote the set of vertices in $V \setminus C$ that are adjacent to at least one vertex in $C$, let $\delta(C) \subseteq E$ be the set of edges with one endpoint in $C$ and one endpoint not in $C$, and let $E(C) \subseteq E$ be the set of edges with both endpoints in $C$. Let $G(C)$ be the subgraph of $G$ induced by $C \cup N(C)$. We define LP(C) to be the LP for $G(C)$, but where edges in $\delta(C)$ are modified to have cost 0.

Let $\mathrm{LP}^*$ be the value of an optimal solution to LP, and let $\mathrm{LP}^*(C)$ be the value of an optimal solution to LP(C).

Lemma 3.8. $\sum_{C \in P}\mathrm{LP}^*(C) \le \mathrm{LP}^*$ for every partition $P \in \mathcal{T}$.

Proof. Let $(x, f)$ be an optimal fractional solution to LP. We want to use this solution to build fractional solutions to LP(C) for all $C \in P$ whose total cost is at most $\mathrm{LP}^*$. For each cluster $C \in P$, define a solution $(x^C, f^C)$ for LP(C) as follows. Let $x^C_e = x_e$ if $e \in E(C)$ and let $x^C_e = 1$ if $e \in \delta(C)$. Note that this already satisfies all of the knapsack-cover constraints for edges in $\delta(C)$. For edges $(u,v) \in E(C)$, note that every path in $\mathcal{P}_{u,v}$ appears in $G(C)$, so we can set $f^C_P = f_P$ for these paths. Since these flows satisfy the knapsack-cover constraints in LP, they also satisfy the knapsack-cover constraints in LP(C). All other flows $f^C_P$ (e.g., between vertices in $N(C)$) are set to 0, and clearly the capacity constraints are satisfied; hence $(x^C, f^C)$ is a feasible solution to LP(C).

Since in LP(C) the edges in $\delta(C)$ have cost 0, and every edge of $E$ is in $E(C)$ for at most one cluster $C$,

$$\sum_{C \in P}\mathrm{LP}^*(C) \le \sum_{C \in P}\ \sum_{e \notin \delta(C)} c_e x^C_e \le \sum_{e \in E} c_e x_e = \mathrm{LP}^*,$$

which proves the lemma. □
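The cost accounting in this proof can be mirrored in a few lines. Given a partition and an LP solution $x$ (flows are omitted here), the sketch below builds $x^C$ by copying $x_e$ on $E(C)$ and setting $x^C_e = 1$ on $\delta(C)$, charges only the edges of $E(C)$, and lets us check that the total over all clusters does not exceed $\sum_e c_e x_e$. Edges with both endpoints in $N(C)$ are simply left out of $x^C$, which amounts to giving them value 0; that detail, the function name, and the toy data are our assumptions.

```python
def restricted_costs(edges, cost, x, partition):
    """Return per-cluster solutions x^C and their LP(C) costs."""
    owner = {v: i for i, cluster in enumerate(partition) for v in cluster}
    results = []
    for i in range(len(partition)):
        x_c, cost_c = {}, 0.0
        for e in edges:
            u, v = e
            inside_u, inside_v = owner[u] == i, owner[v] == i
            if inside_u and inside_v:   # e in E(C): copy x_e and pay for it
                x_c[e] = x[e]
                cost_c += cost[e] * x[e]
            elif inside_u or inside_v:  # e in delta(C): x_e^C = 1 at cost 0
                x_c[e] = 1.0
        results.append((x_c, cost_c))
    return results


edges = [(0, 1), (1, 2), (2, 3), (3, 0), (1, 3)]
cost = {e: 1.0 for e in edges}
x = {(0, 1): 0.5, (1, 2): 0.25, (2, 3): 1.0, (3, 0): 0.75, (1, 3): 0.5}
parts = restricted_costs(edges, cost, x, [{0, 1}, {2, 3}])
```

Here the clusters pay $0.5$ and $1.0$, for a total of $1.5 \le \sum_e c_e x_e = 3.0$; the boundary edges are what make the per-cluster solutions feasible without paying for them.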
We can now give our distributed algorithm for Minimum Cost r-Fault Tolerant 2-Spanner:



Algorithm 2: Distributed algorithm for r-fault tolerant 2-spanner.
1 for $i \leftarrow 1$ to $t = O(\log n)$ do
2     Sample a partition $P_i$ from $\mu$ using Lemma 3.7; // we assume the center of each cluster $C \in P_i$ knows $G(C)$
3     The center of each cluster $C \in P_i$ solves LP(C) using Lemma 3.2, and sends the solution $(x^{C,i}, f^{C,i})$ to all vertices in $C$;
4 For each edge $(u,v) \in E$, let $\mathcal{I}(u,v) = \{i : P_i(u) = P_i(v)\}$; // these are the iterations in which both endpoints are in the same cluster
5 $x_e \leftarrow \min\{1, \frac{4}{t}\sum_{i \in \mathcal{I}(e)} x_e^{P_i(e),i}\}$; // $P_i(e)$ is the cluster of $P_i$ containing both endpoints of $e$
6 Round $x_e$ using Algorithm 1; // each edge is rounded by its endpoints
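Steps 4 and 5 of Algorithm 2 amount to a short aggregation routine. In the sketch below, which uses a hypothetical data layout, `partitions[i]` maps each vertex to its cluster label in $P_i$ and `solutions[i]` maps (cluster label, edge) to the value $x_e^{C,i}$ computed by that cluster's center. The routine forms $\mathcal{I}(u,v)$ and returns $x_e = \min\{1, \frac{4}{t}\sum_{i\in\mathcal{I}(e)} x_e^{P_i(e),i}\}$.

```python
def aggregate(edges, partitions, solutions):
    """Combine per-iteration cluster LP values into one fractional x_e."""
    t = len(partitions)
    x = {}
    for (u, v) in edges:
        # I(u, v): iterations in which both endpoints share a cluster
        iters = [i for i, P in enumerate(partitions) if P[u] == P[v]]
        total = sum(solutions[i][(partitions[i][u], (u, v))] for i in iters)
        x[(u, v)] = min(1.0, 4.0 * total / t)
    return x


partitions = [{0: "a", 1: "a"}, {0: "a", 1: "b"}, {0: "c", 1: "c"}, {0: "a", 1: "b"}]
solutions = [{("a", (0, 1)): 0.1}, {}, {("c", (0, 1)): 0.1}, {}]
x = aggregate([(0, 1)], partitions, solutions)
```

With $t = 4$ iterations, of which two place both endpoints together with value $0.1$ each, the edge receives $x_e = \frac{4}{4}(0.1 + 0.1) = 0.2$. Each endpoint can evaluate this locally, since it knows the partitions and the values sent by its cluster centers.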

Theorem 3.9. Algorithm 2 terminates in $O(\log^2 n)$ rounds and computes (in expectation) an $O(\log n)$-approximation to Minimum Cost r-Fault Tolerant 2-Spanner.



Proof. We first prove the time bound. Lemma 3.7 implies that sampling from $\mu$ takes only $O(\log n)$ rounds, and since the diameter of every cluster (together with its center) is at most $O(\log n)$, the other two steps of the loop also take only $O(\log n)$ rounds. Since we execute the loop $O(\log n)$ times, the number of rounds needed to complete the loop is at most $O(\log^2 n)$. After the loop, each vertex can compute $x_e$ for all incident edges $e$ without any extra communication (since each endpoint of an edge $e$ knows $\mathcal{I}(e)$ and the LP values for those iterations). Finally, as already pointed out, the rounding of Algorithm 1 can be done locally, with one extra round used to make sure that both endpoints of an edge know whether the edge was included by the rounding. Thus the total number of rounds is $O(\log^2 n)$, as claimed.

To prove that this algorithm returns an $O(\log n)$-approximation, we will show that with high probability the $x_e$ values it computes form a feasible solution to LP (when appropriate flow values $f_P$ are chosen) of cost at most $O(\mathrm{LP}^*)$. Once we have this, the analysis of Theorem 3.3 implies that the rounding step outputs (in expectation) a spanner $G' = (V, E')$ whose cost is $O(\log n)\sum_e c_e x_e \le O(\log n)\,\mathrm{LP}^*$, which is clearly an $O(\log n)$-approximation as asserted in the theorem. To bound the cost, note that each value $x_e/4$ is at most the average, over the $t$ iterations, of the LP(C) values, where only iterations in which the edge $e$ does not have cost 0 contribute. In other words,

$$\sum_{e \in E} c_e x_e \le \frac{4}{t}\sum_{i=1}^{t}\sum_{C \in P_i}\mathrm{LP}^*(C) \le 4\,\mathrm{LP}^*,$$

where the final inequality is from Lemma 3.8. So it just remains to show that the $x_e$'s form a feasible solution to LP.

To prove this, consider an edge $e = (u,v)$, and let $\mathcal{I}'(e) \subseteq \mathcal{I}(e)$ be the set of iterations $i$ in which $N(u) \cup \{u\}$ lies entirely in one cluster of $P_i$ (where we fix $u$ as one of the endpoints of $e$ in an arbitrary manner). By the second property of padded decompositions, the probability that $N(u) \cup \{u\}$ is all in the same cluster is at least 1/2. The iterations are independent, so a straightforward Chernoff bound implies that $|\mathcal{I}'(e)| \ge t/4$ with high probability. For a path $P \in \mathcal{P}_{u,v}$, set

$$f_P = \frac{1}{|\mathcal{I}'(e)|}\sum_{i \in \mathcal{I}'(e)} f_P^{P_i(u,v),i}.$$

In other words, the flow along a path from $u$ to $v$ equals the average flow along it in the LP solutions that were computed in iterations when $N(u) \cup \{u\}$ was all in the same cluster.

The capacity constraints are clearly satisfied, since each iteration satisfies the capacity constraints, and the edge capacities are scaled by $4/t$ while flows are scaled by $1/|\mathcal{I}'(e)|$, which can only be smaller. Note that here we depend on the fact that all of $N(u)$ is in the same cluster as $u$: if some vertex $z \in N(u)$ were in a different cluster, then in the LP solution for the cluster containing $u$ and $v$ there could be flow sent from $u$ to $v$ through $z$. This flow would not have the corresponding capacity added to the $x_e$ variables, which would be a problem.

Similarly, consider the knapsack-cover constraint for some $(u,v) \in E$ and some $W \subseteq \mathcal{P}_{u,v}$ with $|W| \le r$. Since we could send enough flow in each iteration in $\mathcal{I}'(u,v)$, when we take the average we can still send enough flow, i.e.,

$$\sum_{P \in \mathcal{P}_{u,v}\setminus W} f_P = \frac{1}{|\mathcal{I}'(u,v)|}\sum_{i \in \mathcal{I}'(u,v)}\ \sum_{P \in \mathcal{P}_{u,v}\setminus W} f_P^{P_i(u,v),i} \ge r + 1 - |W|,$$

where the last inequality is by the knapsack-cover constraint for the cluster $P_i(u,v)$. Thus we have a valid LP solution, completing the proof. □



Remark: While for our purposes it was enough to solve the LP to within a constant factor (since we lose an $O(\log n)$ factor in the rounding anyway), it is easy to see that we could in fact solve the LP to within a $(1+\epsilon)$ factor. First, we could change the padded decomposition so that the padding property ($u$ and $N(u)$ all lie in the same cluster) holds with probability at least $1-\epsilon$, which would require increasing the diameter of the clusters, and thus the number of rounds it takes to solve the LP, by an $O(1/\epsilon)$ factor. Second, when we apply the Chernoff bound, instead of requiring the number of times the padding event occurs to be at least $t/4$, we could require it to be at least $(1-\epsilon)^2 t$. By increasing $t$ by an $O(1/\epsilon^2)$ factor, we still get the right probabilities. Overall, the number of rounds would now be $O(\epsilon^{-3}\log^2 n)$.



4 Conclusions and Future Work 

This paper considers the problem of constructing r-fault tolerant spanners and gives two basic constructions. For general stretch bounds $k \ge 3$, we show how to construct r-fault tolerant k-spanners whose size is at most polynomially (in r) larger than that of spanners without fault tolerance, improving over the previous exponential dependence (on r) of [CLPR08]. Our main technique is oversampling failure sets, in order to handle many of them in one iteration. An interesting open question is to provide lower bounds on the size of the best r-fault tolerant k-spanner; to the best of our knowledge, no such bounds are known other than those that apply even when r = 0.

For k = 2 and unit edge lengths we design an $O(\log n)$-approximation algorithm (for all r), improving over the previous $O(r \log n)$ factor of [DK10] and showing that the approximation ratio can be independent of the desired amount of fault tolerance r. Our main technique here is to design a new linear programming relaxation that includes the exponentially many knapsack-cover inequalities of [CFLP00]. We also provided a distributed version of the algorithm, and showed that when all edge costs are 1 the approximation can be improved to $O(\log \Delta)$. An interesting open question is to improve this ratio to $O(\log(|E|/|V|))$, which would match the approximation known for the non-fault tolerant version.